Exploration server on Mozart

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Simultaneous Unsupervised Learning of Disparate Clusterings

Internal identifier: 001143 (Main/Exploration); previous: 001142; next: 001144

Authors: Prateek Jain [United States]; Raghu Meka [United States]; Inderjit S. Dhillon [United States]

Source: Statistical Analysis and Data Mining, vol. 1, no. 3, pp. 195–210, 2008 (Wiley)

RBID : ISTEX:82B51EB4684DA3F88BC9E5B2A21E4F6ABB5A0A02

English descriptors: disparate clustering; expectation maximization; k‐means; unsupervised learning

Abstract

Most clustering algorithms produce a single clustering for a given dataset, even when the data can naturally be clustered in multiple ways. In this paper, we address the difficult problem of uncovering disparate clusterings from the data in a fully unsupervised manner. We propose two new approaches to this problem. In the first approach, we aim to find good clusterings of the data that are also decorrelated with one another. To this end, we give a new and tractable characterization of decorrelation between clusterings and present an objective function that captures it. We provide an iterative “decorrelated” k‐means‐type algorithm to minimize this objective function. In the second approach, we model the data as a sum of mixtures and associate each mixture with a clustering. This approach leads us to the problem of learning a convolution of mixture distributions. Although the latter problem can be formulated as one of factorial learning [8, 13, 16], the existing formulations and methods do not perform well on many real high‐dimensional datasets. We propose a new regularized factorial‐learning framework that is better suited to capturing the notion of disparate clusterings in modern, high‐dimensional datasets. Furthermore, we provide kernelized versions of both of our algorithms. The resulting algorithms perform well at uncovering multiple clusterings and improve substantially on existing methods. We evaluate our methods on two real‐world datasets: a music dataset from the text‐mining domain and a portrait dataset from the computer‐vision domain. Our methods achieve substantially higher accuracy than existing factorial‐learning methods as well as traditional clustering algorithms. Copyright © 2008 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 1: 195–210, 2008
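The abstract does not give either model in detail. The following is therefore only a minimal illustrative sketch, and it is neither the authors' regularized factorial‐learning framework nor their decorrelated k‐means.

The sketch assumes the simplest additive two-factor model: each point is approximated as A[i] + B[j], one center drawn from each of two clusterings. It fits this model by hard-assignment alternating minimization, in the spirit of k‐means. It then scores how disparate the two recovered labelings are using mutual information. Mutual information is a standard measure swapped in here; the paper uses its own decorrelation characterization instead. All function names (`fit_sum_of_mixtures`, `mutual_information`, and the helpers) are hypothetical.

```python
import math
import random
from collections import Counter

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def mean(vectors, dim):
    """Coordinate-wise mean of a list of vectors, or None if the list is empty."""
    if not vectors:
        return None
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def fit_sum_of_mixtures(X, k1, k2, iters=20, seed=0):
    """Hypothetical sketch: approximate each x as A[i] + B[j] with hard assignments.

    Returns both center sets, the labelings from the last assignment step,
    and the squared-error objective recorded at each assignment step.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    mu = mean(X, dim)
    # Arbitrary initialization: A from sampled points, B from sampled offsets around the mean.
    A = [list(x) for x in rng.sample(X, k1)]
    B = [sub(x, mu) for x in rng.sample(X, k2)]
    pairs = [(i, j) for i in range(k1) for j in range(k2)]
    history = []
    for _ in range(iters):
        # Assignment step: each point takes the (i, j) pair whose summed center is closest.
        z1, z2 = [], []
        err = 0.0
        for x in X:
            i, j = min(pairs, key=lambda p: sq_dist(x, add(A[p[0]], B[p[1]])))
            z1.append(i)
            z2.append(j)
            err += sq_dist(x, add(A[i], B[j]))
        history.append(err)
        # Update A with B fixed: A[i] is the mean residual x - B[j] over points labeled i.
        for i in range(k1):
            m = mean([sub(x, B[j]) for x, a, j in zip(X, z1, z2) if a == i], dim)
            if m is not None:  # an empty cluster keeps its previous center
                A[i] = m
        # Update B with the new A fixed: B[j] is the mean residual x - A[i].
        for j in range(k2):
            m = mean([sub(x, A[i]) for x, i, b in zip(X, z1, z2) if b == j], dim)
            if m is not None:
                B[j] = m
    return A, B, z1, z2, history

def mutual_information(z1, z2):
    """Mutual information (in nats) between two labelings; 0 means independent."""
    n = len(z1)
    c1, c2, c12 = Counter(z1), Counter(z2), Counter(zip(z1, z2))
    return sum((nij / n) * math.log(nij * n / (c1[i] * c2[j]))
               for (i, j), nij in c12.items())

if __name__ == "__main__":
    # Toy data built exactly as (one of two x-offsets) + (one of two y-offsets).
    X = [[a + c, b + d] for a, b in [(0, 0), (10, 0)] for c, d in [(0, 0), (0, 10)]]
    A, B, z1, z2, history = fit_sum_of_mixtures(X, k1=2, k2=2)
    print("objective per iteration:", history)
    print("MI between the two labelings:", mutual_information(z1, z2))
```

Each pass has three steps: assign, then update A with B fixed, then update B with A fixed. Each step exactly minimizes the squared error over one block of variables, so the recorded objective never increases. Like k‐means, however, the procedure can stop in a local minimum.

The model is only identified up to a shared shift: A[i] + c and B[j] − c fit equally well. None of the paper's regularization or kernelization is included.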

Url: https://api.istex.fr/document/82B51EB4684DA3F88BC9E5B2A21E4F6ABB5A0A02/fulltext/pdf
DOI: 10.1002/sam.10007


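Affiliations: Department of Computer Sciences, University of Texas, Austin, TX 78712‐1188, United States (all three authors)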

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Simultaneous Unsupervised Learning of Disparate Clusterings</title>
<author>
<name sortKey="Jain, Prateek" sort="Jain, Prateek" uniqKey="Jain P" first="Prateek" last="Jain">Prateek Jain</name>
</author>
<author>
<name sortKey="Meka, Raghu" sort="Meka, Raghu" uniqKey="Meka R" first="Raghu" last="Meka">Raghu Meka</name>
</author>
<author>
<name sortKey="Dhillon, Inderjit S" sort="Dhillon, Inderjit S" uniqKey="Dhillon I" first="Inderjit S." last="Dhillon">Inderjit S. Dhillon</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:82B51EB4684DA3F88BC9E5B2A21E4F6ABB5A0A02</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1002/sam.10007</idno>
<idno type="url">https://api.istex.fr/document/82B51EB4684DA3F88BC9E5B2A21E4F6ABB5A0A02/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002D13</idno>
<idno type="wicri:Area/Istex/Curation">002852</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B72</idno>
<idno type="wicri:doubleKey">1932-1864:2008:Jain P:simultaneous:unsupervised:learning</idno>
<idno type="wicri:Area/Main/Merge">001156</idno>
<idno type="wicri:Area/Main/Curation">001143</idno>
<idno type="wicri:Area/Main/Exploration">001143</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Simultaneous Unsupervised Learning of Disparate Clusterings</title>
<author>
<name sortKey="Jain, Prateek" sort="Jain, Prateek" uniqKey="Jain P" first="Prateek" last="Jain">Prateek Jain</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Sciences, University of Texas, Austin, TX 78712‐1188</wicri:regionArea>
<placeName>
<region type="state">Texas</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Meka, Raghu" sort="Meka, Raghu" uniqKey="Meka R" first="Raghu" last="Meka">Raghu Meka</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Sciences, University of Texas, Austin, TX 78712‐1188</wicri:regionArea>
<placeName>
<region type="state">Texas</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Dhillon, Inderjit S" sort="Dhillon, Inderjit S" uniqKey="Dhillon I" first="Inderjit S." last="Dhillon">Inderjit S. Dhillon</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Sciences, University of Texas, Austin, TX 78712‐1188</wicri:regionArea>
<placeName>
<region type="state">Texas</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Statistical Analysis and Data Mining</title>
<title level="j" type="abbrev">Statistical Analy Data Mining</title>
<idno type="ISSN">1932-1864</idno>
<idno type="eISSN">1932-1872</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2008-11-25">2008-11-25</date>
<biblScope unit="volume">1</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="195">195</biblScope>
<biblScope unit="page" to="210">210</biblScope>
</imprint>
<idno type="ISSN">1932-1864</idno>
</series>
<idno type="istex">82B51EB4684DA3F88BC9E5B2A21E4F6ABB5A0A02</idno>
<idno type="DOI">10.1002/sam.10007</idno>
<idno type="ArticleID">SAM10007</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1932-1864</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>disparate clustering</term>
<term>expectation maximization</term>
<term>k‐means</term>
<term>unsupervised learning</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Most clustering algorithms produce a single clustering for a given dataset, even when the data can naturally be clustered in multiple ways. In this paper, we address the difficult problem of uncovering disparate clusterings from the data in a fully unsupervised manner. We propose two new approaches to this problem. In the first approach, we aim to find good clusterings of the data that are also decorrelated with one another. To this end, we give a new and tractable characterization of decorrelation between clusterings and present an objective function that captures it. We provide an iterative “decorrelated” k‐means‐type algorithm to minimize this objective function. In the second approach, we model the data as a sum of mixtures and associate each mixture with a clustering. This approach leads us to the problem of learning a convolution of mixture distributions. Although the latter problem can be formulated as one of factorial learning [8, 13, 16], the existing formulations and methods do not perform well on many real high‐dimensional datasets. We propose a new regularized factorial‐learning framework that is better suited to capturing the notion of disparate clusterings in modern, high‐dimensional datasets. Furthermore, we provide kernelized versions of both of our algorithms. The resulting algorithms perform well at uncovering multiple clusterings and improve substantially on existing methods. We evaluate our methods on two real‐world datasets: a music dataset from the text‐mining domain and a portrait dataset from the computer‐vision domain. Our methods achieve substantially higher accuracy than existing factorial‐learning methods as well as traditional clustering algorithms. Copyright © 2008 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 1: 195–210, 2008</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Texas</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Texas">
<name sortKey="Jain, Prateek" sort="Jain, Prateek" uniqKey="Jain P" first="Prateek" last="Jain">Prateek Jain</name>
</region>
<name sortKey="Dhillon, Inderjit S" sort="Dhillon, Inderjit S" uniqKey="Dhillon I" first="Inderjit S." last="Dhillon">Inderjit S. Dhillon</name>
<name sortKey="Meka, Raghu" sort="Meka, Raghu" uniqKey="Meka R" first="Raghu" last="Meka">Raghu Meka</name>
</country>
</tree>
</affiliations>
</record>

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Musique/explor/MozartV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001143 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001143 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Musique
   |area=    MozartV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:82B51EB4684DA3F88BC9E5B2A21E4F6ABB5A0A02
   |texte=   Simultaneous Unsupervised Learning of Disparate Clusterings
}}

Wicri

This area was generated with Dilib version V0.6.20.
Data generation: Sun Apr 10 15:06:14 2016. Site generation: Tue Feb 7 15:40:35 2023